Abstract — The design of the precoder that maximizes the
mutual information in linear vector Gaussian channels with an
arbitrary input distribution is studied. Specifically, the optimal
left singular vectors and singular values of the precoder are
derived. The characterization of the right singular vectors is
left, in general, as an open problem whose computational
complexity is then studied in three cases: Gaussian signaling,
low SNR, and high SNR. For the Gaussian signaling case and the
low SNR regime, the dependence of the mutual information on the
right singular vectors vanishes, making the optimal precoder
design problem easy to solve. In the high SNR regime, however,
the dependence on the right singular vectors cannot be avoided,
and we show the difficulty of computing the optimal precoder
through an NP-hardness analysis.

I. Introduction 

In linear vector Gaussian channels with an average power 
constraint, capacity is achieved by zero-mean Gaussian inputs, 
whose covariance is aligned with the channel eigenmodes 
and where the power is distributed among the covariance 
eigenvalues according to the waterfilling policy [1], [2]. Despite
the information-theoretic optimality of Gaussian inputs,
they are seldom used in practice due to their implementation 
complexity. Rather, system designers often resort to simple 
discrete constellations, such as BPSK or QAM. 

In this context, the scalar relationship between mutual 
information and minimum mean square error (MMSE) for 
linear vector Gaussian channels put forth recently in [3], and 
extended to the vector case in [4], has become a fundamental 
tool in transmitter design beyond the Gaussian signaling case. 

In [5], the authors derived the optimal diagonal precoder,
or power allocation, in quasi-closed form, coining the term
mercury/waterfilling. Their results were obtained for the
particular case of a diagonal channel corrupted with AWGN and
under the assumption that the components of the input vector
are independent. The mercury/waterfilling policy was later
extended to non-diagonal channels in [6] through a numerical
algorithm.

The linear transmitter design (or linear precoding) problem 
was recently studied in [7], [8] with a wider scope by 
considering full (non-diagonal) precoder and channel matrices 
and arbitrary inputs with possibly dependent components. In 
[7], [8], the authors gave necessary conditions for the optimal
precoder and the optimal transmit covariance matrix and proposed
numerical iterative methods to compute a (in general suboptimal)
solution. Despite all these research efforts, a general solution
for this problem is still missing. In this work, we take a step
towards the characterization of its solution and provide some
insight into why this problem is so challenging.
The contributions of the present paper are: 

1) The expression for the optimal left singular vector 
matrix of the precoder that maximizes a wide family of 
objective functions (including the mutual information) 
is given. 

2) We give a necessary and sufficient condition for the
optimal singular values of the precoder that maximizes
the mutual information and propose an efficient method
to compute them numerically.

3) We show that the dependence of the mutual information 
on the right singular vector matrix of the precoder is 
a key element in the intractability of computing the 
precoder that maximizes the mutual information. 

4) We give an expression for the Jacobian of the mutual
information with respect to the transmitted signal
covariance, correcting the expression in [4, Eq. (24)].

Formalism: In this work we define a program according to

$$\{f_0^\star, x_1^\star, \ldots, x_m^\star\} = \mathsf{Name}(a_1, \ldots, a_p) := \underset{x_1, \ldots, x_m}{\max/\min}\ f_0(x_1, \ldots, x_m, a_1, \ldots, a_p) \quad \text{s.t.}\quad f_i(x_1, \ldots, x_m, a_1, \ldots, a_p) \le 0,\ \forall i, \qquad (1)$$

where $(a_1, \ldots, a_p)$ are the parameters and $(x_1, \ldots, x_m)$ are
the optimization variables. Observe that the first returned
argument, $f_0^\star$, corresponds to the optimal value of the objective
function. We also make use of the Jacobian operator $D$ applied
to a matrix-valued function $F$ of a matrix argument $X$, defined
as $D_X F = \partial \operatorname{vec} F / \partial \operatorname{vec}^T X$ [9, Sec. 9.4], where $\operatorname{vec} X$ is
the vector obtained by stacking the columns of $X$. This notation
requires some modifications when either F or X are symmetric 
matrices, see [9] for details. In Section VI we use some 
concepts of computational complexity and program reductions. 
See [10], [11] for reference. 
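As a quick illustration of the convention $D_X F = \partial \operatorname{vec} F / \partial \operatorname{vec}^T X$, the following sketch (Python/NumPy; the helper names `vec` and `numerical_jacobian` are our own, hypothetical choices) estimates $D_X F$ by central differences and compares it with the standard identity $D_X(AXB) = B^T \otimes A$, which follows from $\operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec} X$. It only covers unstructured $X$; the modifications for symmetric arguments mentioned above are not handled.

```python
import numpy as np


def vec(X):
    """Stack the columns of X into a single vector (column-major order)."""
    return X.reshape(-1, order="F")


def numerical_jacobian(F, X, h=1e-6):
    """Estimate D_X F = d vec F / d vec^T X by central differences."""
    x0 = vec(X)
    cols = []
    for k in range(x0.size):
        dx = np.zeros_like(x0)
        dx[k] = h
        X_plus = (x0 + dx).reshape(X.shape, order="F")
        X_minus = (x0 - dx).reshape(X.shape, order="F")
        # Column k holds the sensitivity of vec F to the k-th entry of vec X.
        cols.append((vec(F(X_plus)) - vec(F(X_minus))) / (2.0 * h))
    return np.column_stack(cols)


rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((4, 2))
X = rng.standard_normal((2, 4))
J = numerical_jacobian(lambda Y: A @ Y @ B, X)
# For F(X) = A X B the Jacobian is exactly B^T kron A.
print(np.allclose(J, np.kron(B.T, A), atol=1e-6))  # expected: True
```

Central differences are exact (up to rounding) for a linear map, which is why the comparison can use a tight tolerance.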



II. Signal model 

We consider a general discrete-time linear vector Gaussian
channel, whose output $Y \in \mathbb{R}^n$ is represented by the following
signal model

$$Y = HPS + Z, \qquad (2)$$

where $S \in \mathbb{R}^m$ is the input vector distributed according to
$P_S(s)$, the matrices $H \in \mathbb{R}^{n \times p}$ and $P \in \mathbb{R}^{p \times m}$ represent the
channel and precoder linear transformations, respectively, and
$Z \in \mathbb{R}^n$ represents a zero-mean Gaussian noise with identity
covariance matrix $R_Z = I$¹.

For the sake of simplicity, we assume that $\mathrm{E}\{S\} = 0$
and $\mathrm{E}\{SS^T\} = I$. The transmitted power $p$ is thus given
by $p = \operatorname{Tr}(PP^T)$. We will also make use of the notation
$P = \sqrt{p}\,\bar{P}$, with $\operatorname{Tr}(\bar{P}\bar{P}^T) = 1$, and also define $R_H = H^T H$.
Moreover, we define the SVD of the precoder
as $P = U_P \Sigma_P V_P^T$, the entries of $\Sigma_P$ as $\sigma_i = [\Sigma_P]_{ii}$,
and the eigendecomposition of the channel covariance
as $R_H = U_H \Lambda_H U_H^T$. Finally, we define the MMSE matrix
as $E_S = \mathrm{E}\{(S - \mathrm{E}\{S \mid Y\})(S - \mathrm{E}\{S \mid Y\})^T\}$.
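The definitions above can be instantiated numerically as in the sketch below, which simulates model (2) with i.i.d. BPSK inputs (an illustrative choice satisfying $\mathrm{E}\{S\} = 0$, $\mathrm{E}\{SS^T\} = I$) and forms the SVD of $P$ and the eigendecomposition of $R_H$. Because the paper uses $p$ both for the number of precoder rows and for the power, the code uses the hypothetical names `p_dim` and `power` to keep them apart. All sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_dim, m, power = 3, 4, 2, 2.0   # illustrative sizes and power budget p
n_uses = 1000                       # number of channel uses to simulate

H = rng.standard_normal((n, p_dim))
P_bar = rng.standard_normal((p_dim, m))
P_bar /= np.sqrt(np.trace(P_bar @ P_bar.T))   # Tr(P_bar P_bar^T) = 1
P = np.sqrt(power) * P_bar                     # P = sqrt(p) P_bar, so Tr(P P^T) = p

S = rng.choice([-1.0, 1.0], size=(m, n_uses))  # i.i.d. BPSK: E{S} = 0, E{S S^T} = I
Z = rng.standard_normal((n, n_uses))           # white noise, R_Z = I
Y = H @ P @ S + Z                              # model (2), one column per channel use

U_P, sigma, Vt_P = np.linalg.svd(P, full_matrices=False)  # P = U_P Sigma_P V_P^T
R_H = H.T @ H
lam_H, U_H = np.linalg.eigh(R_H)               # R_H = U_H Lambda_H U_H^T (ascending order)
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, so any code that needs "the largest eigenvalues" has to reverse the ordering.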

III. Problem definition and structure of the solution

In this paper we are interested in studying the properties
of the precoder $P$ that maximizes the mutual information
under an average transmitted power constraint. In this
section, however, we consider the more general problem

$$\{\mathcal{P}_0^\star, P^\star\} = \mathsf{MaxPerformance}(p, P_S(s), R_H) := \max_{P}\ \mathcal{P}_0 \quad \text{s.t.}\quad \operatorname{Tr}(PP^T) = p, \qquad (3)$$

where $\mathcal{P}_0$ is a generic performance measure that depends on
the precoder $P$ through the received vector $Y$.

In the following lemma we characterize the dependence of
$\mathcal{P}_0$ on the precoder matrix $P$.

Lemma 1: Consider a performance measure $\mathcal{P}_0$ of the system
$Y = HPS + Z$ such that $\mathcal{P}_0$ depends on the distribution
of the random observation $Y$ conditioned on the input $S$. It
then follows that $\mathcal{P}_0$ depends on the precoder $P$
only through $P^T R_H P$, and we can thus write, without loss of
generality, $\mathcal{P}_0 = \mathcal{P}_0(P^T R_H P)$.

Proof: The proof follows by noting that
$P^T H^T Y$ is a sufficient statistic of $Y$ [2, Section 2.10]. The
sufficient statistic is thus $P^T H^T H P S + P^T H^T Z$. The first
term clearly depends on $P$ only through $P^T R_H P$. Since
the second term $P^T H^T Z$ is a Gaussian random vector, its
distribution is completely determined by its mean (zero)
and its covariance matrix, given by $P^T R_H P$. ■

From all the possible choices for the performance measure
$\mathcal{P}_0$, we now focus our attention on the
class of reasonable performance measures, which is
defined next.

¹The assumption $R_Z = I$ is made w.l.o.g. since, for $R_Z \neq I$, we
could always consider the whitened received signal $R_Z^{-1/2} Y$.

Definition 1: A performance measure $\mathcal{P}_0(P^T R_H P)$ is
said to be reasonable if $\mathcal{P}_0(\alpha P^T R_H P) > \mathcal{P}_0(P^T R_H P)$
for any $\alpha > 1$ and $P^T R_H P \neq 0$, which
implies that $\mathcal{P}_0$ is a power-efficient performance measure.

Remark 1: The generic cost function² $f_0$ considered in [12]
was assumed to be a function of the elements of the vector
$\operatorname{diag}((I + P^T R_H P)^{-1})$ and increasing in each argument.
Recalling that, for any $\alpha > 1$ and $P^T R_H P \neq 0$, we have

$$[\operatorname{diag}((I + \alpha P^T R_H P)^{-1})]_i \le [\operatorname{diag}((I + P^T R_H P)^{-1})]_i, \quad \forall i, \qquad (4)$$

with strict inequality for at least one $i$, it is straightforward
to see that the performance measure defined as $\mathcal{P}_0 = -f_0$
is a reasonable performance measure according to Definition 1.

Based on a result in [12] for the design of optimal linear
precoders, we characterize the left singular vectors of an
optimal precoder of (3).

Proposition 1: Consider the optimization problem in (3). It
then follows that, for any reasonable performance measure $\mathcal{P}_0$,
the left singular vectors of the optimal precoder $P \in \mathbb{R}^{p \times m}$
can always be chosen to coincide with the eigenvectors of the
channel covariance $R_H$ associated with the $\min\{p, m\}$ largest
eigenvalues.

Proof: For simplicity we consider the case $m \ge p$. The
case $m < p$ follows similarly. From the SVD of the precoder,
$P = U_P \Sigma_P V_P^T$, and the eigendecomposition of the matrix

$$\Sigma_P U_P^T R_H U_P \Sigma_P = Q \Lambda Q^T, \qquad (5)$$

with $\Lambda$ diagonal and $Q$ orthonormal, it follows that

$$Q^T \Sigma_P U_P^T R_H U_P \Sigma_P Q \qquad (6)$$

is a diagonal matrix. From [12, Lemma 12], we can state that
there exists a matrix $M = U_H \Sigma_M$, with $\Sigma_M$ having non-zero
elements only in the main diagonal, such that $M^T R_H M = \Lambda$
and $\operatorname{Tr}(MM^T) \le \operatorname{Tr}(\Sigma_P^2) = \operatorname{Tr}(PP^T)$. Now, we only
need to check that

$$P^T R_H P = V_P Q \Lambda Q^T V_P^T = V_P Q M^T R_H M Q^T V_P^T.$$

Defining $\tilde{P} = M Q^T V_P^T = U_H \Sigma_M \tilde{V}^T$, with $\tilde{V} = V_P Q$, we
have shown by construction that for any given matrix $P$ we
can find another matrix $\tilde{P}$ such that the objective function in
(3) is the same,

$$\tilde{P}^T R_H \tilde{P} = P^T R_H P \;\Rightarrow\; \mathcal{P}_0(\tilde{P}^T R_H \tilde{P}) = \mathcal{P}_0(P^T R_H P), \qquad (7)$$

which follows from Lemma 1, whereas the required transmitted
power is not larger, $\operatorname{Tr}(\tilde{P}\tilde{P}^T) = \operatorname{Tr}(MM^T) \le \operatorname{Tr}(PP^T)$.
Since the performance measure $\mathcal{P}_0$ is reasonable, the result
follows directly. ■
From the result in Proposition 1, it follows that the channel
model in (2) can be simplified, without loss of optimality, to

$$Y' = \Lambda_H^{1/2} \Sigma_P V_P^T S + Z, \qquad (8)$$

where now the only optimization variables are $\Sigma_P$ and $V_P$.
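The simplification to (8) can be checked numerically: if the left singular vectors of $P$ are the eigenvectors of the $m$ largest eigenvalues of $R_H$ (here with $m \le p$, restricting $\Lambda_H$ to those eigenvalues, which is our reading of (8) in this case), then $P^T R_H P$ coincides with $G^T G$ for the diagonal-times-rotation channel $G = \Lambda_H^{1/2}\Sigma_P V_P^T$, so by Lemma 1 both models give the same performance. The function name `aligned_precoder` and all numbers below are hypothetical.

```python
import numpy as np


def aligned_precoder(R_H, sigma, V):
    """Build P = U_H[:, :m] Sigma_P V_P^T and the equivalent channel of (8).

    sigma: m singular values (m <= p); V: m x m orthogonal matrix playing V_P.
    """
    lam, U = np.linalg.eigh(R_H)
    order = np.argsort(lam)[::-1]                # largest eigenvalues first
    lam, U = lam[order], U[:, order]
    k = len(sigma)
    P = U[:, :k] @ np.diag(sigma) @ V.T          # left singular vectors = top eigenvectors
    G = np.diag(np.sqrt(lam[:k]) * sigma) @ V.T  # Lambda_H^{1/2} Sigma_P V_P^T
    return P, G


rng = np.random.default_rng(2)
H = rng.standard_normal((5, 4))
R_H = H.T @ H
V, _ = np.linalg.qr(rng.standard_normal((2, 2)))   # a random orthogonal V_P
P, G = aligned_precoder(R_H, np.array([1.2, 0.5]), V)
# Both models induce the same P^T R_H P, hence (by Lemma 1) the same performance.
print(np.allclose(P.T @ R_H @ P, G.T @ G))  # expected: True
```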

²Observe that, while a performance measure $\mathcal{P}_0$ is to be maximized, a cost
function $f_0$ is usually to be minimized.



IV. Optimal singular values 

In this section we particularize the generic performance
measure considered in the previous section to the input-output
mutual information in (8), i.e., $\mathcal{P}_0 = I(S;Y')$. To compute
the optimal $\Sigma_P$ we define

$$\{I^\star, \Sigma_P^{2\star}\} = \mathsf{OptPowerAlloc}(p, P_S(s), \Lambda_H, V_P) := \max_{\{\sigma_i^2\}}\ I(S;Y') \quad \text{s.t.}\quad \sum_i \sigma_i^2 = p,\ \ \sigma_i^2 \ge 0. \qquad (9)$$

Observe that the optimization is done with respect to the
squared singular values. The optimal singular values
are then defined up to a sign, which does not affect the
mutual information. Consequently, we define $\sigma_i^\star = +\sqrt{\sigma_i^{2\star}}$
and $[\Sigma_P^\star]_{ii} = \sigma_i^\star$.

Let us now present an appealing property of $I(S;Y')$.

Lemma 2 ([13]): Consider the model in (8) and fix $V_P$.
Then the mutual information $I(S;Y')$ is a
concave function of the squared diagonal entries of $\Sigma_P$.

With this result, we can now obtain a necessary and sufficient
condition for the squared entries of $\Sigma_P$.

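Proposition 2: The entries of the squared singular value
matrix, $\sigma_i^{2\star} = [\Sigma_P^{2\star}]_{ii}$, of the solution to (9) satisfy

$$\sigma_i^{2\star} = 0 \;\Rightarrow\; [\Lambda_H]_{ii}\,\mathrm{mmse}_i(\Sigma_P^\star, V_P) \le 2\eta, \qquad \sigma_i^{2\star} > 0 \;\Rightarrow\; [\Lambda_H]_{ii}\,\mathrm{mmse}_i(\Sigma_P^\star, V_P) = 2\eta, \qquad (10)$$

where $\eta$ is such that the power constraint is satisfied and where
we have used $\mathrm{mmse}_i(\Sigma_P, V_P)$ to denote the $i$-th diagonal
entry of the MMSE matrix $E_{\tilde{S}}$ corresponding to the model
$Y' = \Lambda_H^{1/2} \Sigma_P \tilde{S} + Z$ with $\tilde{S} = V_P^T S$.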

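Proof: The proof is based on obtaining the KKT conditions
of the optimization problem in (9) together with

$$\frac{\partial I(S; \Lambda_H^{1/2} \Sigma_P \tilde{S} + Z)}{\partial \sigma_i^2} = \frac{1}{2} [\Lambda_H]_{ii}\, \mathrm{mmse}_i(\Sigma_P, V_P), \qquad (11)$$

which follows from [4, Cor. 2]. ■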
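A minimal sketch of how (10) can be solved in the special case $V_P = I$ with independent BPSK components: then $E_{\tilde{S}}$ is diagonal and $\mathrm{mmse}_i$ depends only on $\mathrm{snr}_i = [\Lambda_H]_{ii}\sigma_i^2$, so (10) reduces to the mercury/waterfilling conditions of [5]. BPSK, the 80-point Gauss-Hermite quadrature for the scalar MMSE, and all function names are our own illustrative assumptions. We also use nested bisection (an inner one on $\sigma_i^2$ and an outer one on $\eta$) instead of the Newton method discussed in Remark 2, trading speed for simplicity.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule, normalized so that sum(W * f(Z)) ~ E[f(N)], N ~ N(0, 1).
Z, W = hermegauss(80)
W = W / np.sqrt(2.0 * np.pi)


def mmse_bpsk(snr):
    """MMSE of a unit-power BPSK symbol X observed as sqrt(snr) X + N(0, 1)."""
    snr = max(float(snr), 0.0)
    # Var(X | Y) = 1 - tanh(sqrt(snr) Y)^2; by symmetry we may condition on X = +1.
    return float(np.sum(W * (1.0 - np.tanh(snr + np.sqrt(snr) * Z) ** 2)))


def power_for_level(lam, eta, s_max=1e4, iters=100):
    """sigma_i^2 >= 0 satisfying condition (10) for a single mode, by bisection."""
    if lam * mmse_bpsk(0.0) <= 2.0 * eta:
        return 0.0                      # inactive mode: lam * mmse <= 2 eta at zero power
    lo, hi = 0.0, s_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lam * mmse_bpsk(lam * mid) > 2.0 * eta:
            lo = mid                    # mmse still too large: the mode needs more power
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mercury_waterfilling_bpsk(lams, p, iters=100):
    """Bisection on the multiplier eta until sum_i sigma_i^2 = p."""
    lo, hi = 1e-12, 0.5 * max(lams)     # at eta = max(lams)/2 every mode is inactive
    for _ in range(iters):
        eta = 0.5 * (lo + hi)
        if sum(power_for_level(l, eta) for l in lams) > p:
            lo = eta                    # too much power allocated: raise eta
        else:
            hi = eta
    eta = 0.5 * (lo + hi)
    return np.array([power_for_level(l, eta) for l in lams]), eta


alloc, eta = mercury_waterfilling_bpsk([2.0, 1.0, 0.25], p=3.0)
print(alloc, alloc.sum())
```

Because of the concavity in Lemma 2, any allocation satisfying (10) is globally optimal for this fixed $V_P$. Unlike Gaussian waterfilling, weak modes can end up with zero power even when some power is still available.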
Remark 2: The set of non-linear equations in (10) can be
solved numerically with, e.g., the Newton method, which
has quadratic convergence; moreover, the concavity property stated
in Lemma 2 guarantees the global optimality of the obtained
solution. The entries of the Jacobian of $\mathrm{mmse}_i(\Sigma_P, V_P)$
with respect to the squared entries of $\Sigma_P$,
which are needed at each iteration, are given by [13]

$$\frac{\partial\, \mathrm{mmse}_i(\Sigma_P, V_P)}{\partial \sigma_k^2} = -[\Lambda_H]_{kk}\, \mathrm{E}\{[\Phi_{\tilde{S}}(Y')]_{ik}^2\},$$

where $\Phi_{\tilde{S}}(y') = \mathrm{E}\{\tilde{S}\tilde{S}^T \mid y'\} - \mathrm{E}\{\tilde{S} \mid y'\}\,\mathrm{E}\{\tilde{S}^T \mid y'\}$.
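In the scalar case (one mode, $[\Lambda_H]_{11} = 1$, so $\sigma_1^2$ is the SNR), the expression above reduces to $\mathrm{d}\,\mathrm{mmse}/\mathrm{d}\,\mathrm{snr} = -\mathrm{E}\{\Phi(Y)^2\}$, with $\Phi(y) = \mathrm{Var}(X \mid Y = y)$. The sketch below checks this numerically for a BPSK input against a central finite difference. BPSK, the quadrature rule, and the helper names are illustrative assumptions, not part of the original method.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

Z, W = hermegauss(80)
W = W / np.sqrt(2.0 * np.pi)   # sum(W * f(Z)) ~ E[f(N)] for N ~ N(0, 1)


def phi_bpsk(snr):
    """Var(X | Y) for BPSK at the quadrature nodes (we condition on X = +1 by symmetry)."""
    return 1.0 - np.tanh(snr + np.sqrt(snr) * Z) ** 2


def mmse_bpsk(snr):
    return float(np.sum(W * phi_bpsk(snr)))          # mmse = E[Phi(Y)]


def dmmse_formula(snr):
    return -float(np.sum(W * phi_bpsk(snr) ** 2))    # scalar Remark 2: -E[Phi(Y)^2]


snr, h = 1.5, 1e-5
finite_diff = (mmse_bpsk(snr + h) - mmse_bpsk(snr - h)) / (2.0 * h)
print(finite_diff, dmmse_formula(snr))
```

At $\mathrm{snr} = 0$ the formula gives $-1$, matching the familiar low-SNR behavior $\mathrm{mmse}(\mathrm{snr}) \approx 1 - \mathrm{snr}$ for a unit-power input.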

At this point, we have obtained the optimal left singular
vectors and the optimal singular values of the linear precoder
that maximizes the mutual information for a fixed
$V_P$. Unfortunately, computing the optimal right singular
vectors $V_P$ appears to be an extremely difficult problem. A
simple suboptimal solution consists in optimizing $V_P$ with
standard numerical methods guaranteed to converge to a
local optimum. See [14] for further details on the practical
algorithm to compute the precoder.
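To see why $V_P$ matters for discrete inputs, the sketch below estimates $I(S;Y')$ in (8) by Monte Carlo for $m = 2$ i.i.d. BPSK symbols, with $V_P$ parameterized as a $2 \times 2$ rotation by an angle $\theta$ (a hypothetical parameterization used only for illustration). All the power is placed on the strongest mode. With $V_P = I$ only one symbol reaches the receiver (at most 1 bit), whereas a suitable rotation mixes both symbols into that mode and yields a 4-PAM-like constellation (close to 2 bits). The estimator is the standard one for equiprobable discrete inputs in Gaussian noise; the names and numbers are our own.

```python
import numpy as np
from itertools import product


def mi_rotated_bpsk(theta, lam, sig2, n_noise=4000, seed=0):
    """Monte Carlo estimate (bits) of I(S; Y') in (8) for S i.i.d. BPSK in R^2.

    V_P is taken to be the 2x2 rotation by theta (an illustrative parameterization).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_noise, 2))                  # same noise samples for every theta
    pts = np.array(list(product([-1.0, 1.0], repeat=2)))   # the four input vectors s
    c, s = np.cos(theta), np.sin(theta)
    V = np.array([[c, -s], [s, c]])
    G = np.diag(np.sqrt(np.asarray(lam) * np.asarray(sig2)))  # Lambda_H^{1/2} Sigma_P
    x = pts @ V @ G                                         # row k = (G V^T s_k)^T
    acc = 0.0
    for xk in x:
        # y = x_k + z; compare the likelihood of every candidate x_j with that of x_k.
        d = xk[None, None, :] - x[:, None, :] + z[None, :, :]
        expo = -0.5 * (np.sum(d ** 2, axis=2) - np.sum(z ** 2, axis=1)[None, :])
        acc += np.mean(np.log2(np.sum(np.exp(expo), axis=0)))
    return np.log2(len(pts)) - acc / len(pts)


lam, sig2 = [4.0, 1.0], [4.0, 0.0]      # all the power on the strongest mode
for theta in (0.0, np.arctan(0.5), np.pi / 2):
    print(round(theta, 3), mi_rotated_bpsk(theta, lam, sig2))
```

Reusing the same noise samples for every $\theta$ makes the comparison across rotations less noisy. A local optimizer over $\theta$ (or over the orthogonal group for larger $m$) is one way to implement the suboptimal search described above.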

From the results presented in this section, it is apparent
that the difficulty of the problem in (3) when optimizing the
mutual information lies in the computation of the optimal right
singular vector matrix $V_P$. To support this statement, in
the following sections we deal with three cases: the Gaussian
signaling case, and the low and high SNR regimes. In the
Gaussian signaling case and the low SNR regime, we recover the
well-known result that the mutual information depends only
on the squared precoder $Q_P = PP^T$ and is independent of the
right singular vector matrix $V_P$, which further implies that,
in both cases, the optimal precoder can be easily computed.
In Section VI we show, through an NP-hardness analysis, that
in the high SNR regime the precoder design problem becomes
computationally difficult.

V. Situations where the mutual information is independent of $V_P$

1) Gaussian signaling case: For the Gaussian signaling
case, we recover the well-known expression for the mutual
information [2]

$$I(S;Y) = \frac{1}{2} \log\det(I + Q_P R_H), \qquad (12)$$

from which it is clear that the only dependence of the mutual
information on the precoder is through $Q_P = U_P \Sigma_P^2 U_P^T$
and, thus, it is independent of $V_P$. As we have pointed out in
the introduction and generalized in Proposition 1, the optimal
covariance $Q_P$ is aligned with the channel eigenmodes $U_H$.
Moreover, the power is distributed among the covariance eigenvalues
$\Sigma_P^2$ according to the waterfilling policy [1], which can
be computed efficiently.
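For reference, the textbook waterfilling solution, $\sigma_i^2 = \max(0, \mu - 1/[\Lambda_H]_{ii})$ with $\mu$ chosen so that the powers sum to $p$, can be sketched as follows, together with an evaluation of (12). The active-set search and the names `waterfilling` and `gaussian_mi` are our own choices; the channel is random and purely illustrative.

```python
import numpy as np


def waterfilling(lams, p):
    """Classic waterfilling: sigma_i^2 = max(0, mu - 1/lam_i) with sum_i sigma_i^2 = p."""
    lams = np.asarray(lams, dtype=float)
    inv = np.sort(1.0 / lams)                  # "floor" levels 1/lam_i, ascending
    for k in range(len(inv), 0, -1):           # try the k strongest modes as active set
        mu = (p + inv[:k].sum()) / k
        if mu > inv[k - 1]:                    # every active mode gets positive power
            break
    return np.maximum(0.0, mu - 1.0 / lams)


def gaussian_mi(Q, R_H):
    """Mutual information (12), in nats, for Gaussian signaling with covariance Q."""
    _, logdet = np.linalg.slogdet(np.eye(Q.shape[0]) + Q @ R_H)
    return 0.5 * logdet


rng = np.random.default_rng(3)
H = rng.standard_normal((4, 3))
R_H = H.T @ H
lam_H, U_H = np.linalg.eigh(R_H)
p = 2.0
Q_wf = U_H @ np.diag(waterfilling(lam_H, p)) @ U_H.T   # aligned with the eigenmodes U_H
print(gaussian_mi(Q_wf, R_H))
```

Because $Q_P$ is aligned with $U_H$, (12) reduces to $\frac{1}{2}\sum_i \log(1 + \sigma_i^2 [\Lambda_H]_{ii})$, and no choice of $V_P$ enters the computation.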

2) Low SNR regime: For the low SNR regime, a first-order
expression of the mutual information is [4]

$$I(S;Y) = \frac{1}{2} \operatorname{Tr}(Q_P R_H) + o(\|Q_P\|). \qquad (13)$$

Just as in the previous case, this expression makes it clear
that the mutual information is insensitive to the right singular
vector matrix $V_P$. Moreover, the optimal matrix $Q_P$ is easy
to obtain in closed form [15]³.
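As a sketch of why the first-order problem is easy (and without claiming that it reproduces the general treatment in [15]): maximizing only the leading term $\frac{1}{2}\operatorname{Tr}(Q_P R_H)$ subject to $\operatorname{Tr}(Q_P) = p$ and $Q_P \succeq 0$ is solved by beamforming all the power onto the dominant eigenvector of $R_H$, giving $\frac{1}{2} p \lambda_{\max}(R_H)$. Any unit right factor works, which mirrors the insensitivity to $V_P$. The helper name and the comparison with the Gaussian expression (12) are our own illustration.

```python
import numpy as np


def low_snr_precoder(R_H, p, m):
    """Maximize the first-order term Tr(Q_P R_H)/2 of (13) subject to Tr(P P^T) = p."""
    lam, U = np.linalg.eigh(R_H)         # eigenvalues in ascending order
    u_max = U[:, -1]                     # eigenvector of the largest eigenvalue
    v = np.zeros(m)
    v[0] = 1.0                           # any unit vector: V_P plays no role at first order
    P = np.sqrt(p) * np.outer(u_max, v)  # rank-one precoder, Q_P = p u_max u_max^T
    return P, 0.5 * p * lam[-1]          # first-order mutual information, in nats


rng = np.random.default_rng(4)
H = rng.standard_normal((4, 3))
R_H = H.T @ H
P, first_order = low_snr_precoder(R_H, p=1e-5, m=2)
exact_gauss = 0.5 * np.linalg.slogdet(np.eye(3) + P @ P.T @ R_H)[1]
print(first_order, exact_gauss)          # the two agree to first order in p
```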

Remark 3: The expression in (13) was derived in [4]
through the expression of the Jacobian of the mutual
information with respect to $Q_P$. Although the result in (13) is
correct, the expression for the Jacobian $D_{Q_P} I(S;Y)$ given in
[4, Eq. (24)] is only valid in the low SNR regime. The correct
expression for $D_{Q_P} I(S;Y)$, valid for all SNRs, is [16]

$$D_{Q_P} I(S;Y) = \frac{1}{2} \operatorname{vec}^T(R_H P E_S P^{-1})\, D_n - \operatorname{vec}^T(E_S P^T R_H U_P \Sigma_P)\, \Omega\, N_n (P^{-1} \otimes P^T) D_n, \qquad (14)$$

with

$$\Omega = \begin{pmatrix} v_1^T \otimes V_P (\sigma_1^2 I - \Sigma_P^2)^+ V_P^T \\ v_2^T \otimes V_P (\sigma_2^2 I - \Sigma_P^2)^+ V_P^T \\ \vdots \\ v_n^T \otimes V_P (\sigma_n^2 I - \Sigma_P^2)^+ V_P^T \end{pmatrix}, \qquad (15)$$

where $v_i$ is the $i$-th column of matrix $V_P$, $N_n$ and $D_n$ are the
symmetrization and duplication matrices defined in [9, Secs.
3.7, 3.8], $A^+$ denotes the Moore-Penrose pseudo-inverse of $A$, and
where, for the sake of clarity, we have assumed that $P^{-1}$ exists
and that $n = m = p$.

³The optimal signaling strategy in the low SNR regime was studied in full
generality in [15]. We recall that, in this work, we are assuming that the
signaling is fixed and the only remaining degree of freedom to maximize the
mutual information is the precoder matrix $P$.
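To make the structured matrices appearing in (14) concrete, the sketch below builds the duplication matrix $D_n$ (with $D_n \operatorname{vech} A = \operatorname{vec} A$ for symmetric $A$), the commutation matrix $K_n$ (with $K_n \operatorname{vec} A = \operatorname{vec} A^T$), and the symmetrizer $N_n = \frac{1}{2}(I + K_n)$, following the standard definitions. The function names are our own, and the explicit loops favor readability over speed.

```python
import numpy as np


def vec(A):
    """Stack the columns of A (column-major order)."""
    return A.reshape(-1, order="F")


def vech(A):
    """Stack the on- and below-diagonal entries of A, column by column."""
    n = A.shape[0]
    return np.array([A[i, j] for j in range(n) for i in range(j, n)])


def duplication_matrix(n):
    """D_n with D_n vech(A) = vec(A) for every symmetric n x n matrix A."""
    D = np.zeros((n * n, n * (n + 1) // 2))
    k = 0
    for j in range(n):
        for i in range(j, n):
            D[i + j * n, k] = 1.0   # position of A[i, j] in vec(A)
            D[j + i * n, k] = 1.0   # position of A[j, i] in vec(A)
            k += 1
    return D


def commutation_matrix(n):
    """K_n with K_n vec(A) = vec(A^T)."""
    K = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            K[i + j * n, j + i * n] = 1.0
    return K


def symmetrizer(n):
    """N_n = (I + K_n) / 2, so that N_n vec(A) = vec((A + A^T) / 2)."""
    return 0.5 * (np.eye(n * n) + commutation_matrix(n))


rng = np.random.default_rng(5)
B = rng.standard_normal((3, 3))
A = B + B.T                                  # a symmetric test matrix
D3, N3 = duplication_matrix(3), symmetrizer(3)
print(np.allclose(D3 @ vech(A), vec(A)), np.allclose(N3 @ vec(B), vec((B + B.T) / 2)))  # expected: True True
```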

VI. High SNR regime 

In this section we consider that the signaling is discrete,
i.e., the input can only take values from a finite set, $S \in \mathcal{S} =
\{s^{(i)}\}_{i=1}^{|\mathcal{S}|}$. As discussed in [5], [8], for discrete inputs and
high SNR, the maximization of the problem in (3) with the
mutual information as performance measure is asymptotically
equivalent to the maximization of the squared minimum
distance, $d_{\min}$⁴, among the received constellation points, defined
as $d_{\min} = \min_{e \in \mathcal{E}} e^T P^T R_H P e$, where $\mathcal{E}$ is the set containing
all the possible differences between the input points in $\mathcal{S}$.
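The quantity $d_{\min}$ is straightforward to evaluate for a given precoder once $\mathcal{E}$ is enumerated, as in the sketch below for two BPSK symbols. The helper names are hypothetical. Note that $|\mathcal{E}|$ grows quadratically with $|\mathcal{S}|$, which is itself exponential in $m$. Evaluating $d_{\min}$ is therefore easy for small constellations; maximizing it over $P$ is the hard part.

```python
import numpy as np
from itertools import product


def difference_set(points):
    """E: all differences s - s' between distinct constellation points."""
    pts = [np.asarray(s, dtype=float) for s in points]
    return [a - b for a, b in product(pts, pts) if not np.array_equal(a, b)]


def d_min(P, R_H, E):
    """Squared minimum distance min_{e in E} e^T P^T R_H P e among received points."""
    M = P.T @ R_H @ P
    return min(float(e @ M @ e) for e in E)


bpsk2 = list(product([-1.0, 1.0], repeat=2))   # S = {-1, +1}^2
E = difference_set(bpsk2)
print(len(E), d_min(np.eye(2), np.eye(2), E))   # 12 differences; d_min = 4.0
```

Scaling the precoder as $\sqrt{\alpha} P$ scales $d_{\min}$ by $\alpha$, which is the property used later in Lemma 5.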

Consequently, let us begin by considering the optimization
problem of finding the precoder that maximizes the minimum
distance among the received constellation points

$$\{d^\star, P_d^\star\} = \mathsf{MaxMinDist}(p, \mathcal{E}, H) := \max_{P} \min_{e \in \mathcal{E}} e^T P^T R_H P e \quad \text{s.t.}\quad \operatorname{Tr}(PP^T) = p. \qquad (16)$$

In the following, we prove that the program in (16)
is NP-hard with respect to the dimension $m$ of the signaling
vector, $S \in \mathbb{R}^m$, for the case where the set $\mathcal{E}$ is considered to
be unstructured (i.e., not constrained to be a difference set). A
proof without this assumption is currently in preparation [14].
The proof is based on a series of Cook reductions. We say that
program A can be Cook reduced to program B, $A \xrightarrow{\text{Cook}} B$, if
program A can be computed with a polynomial time algorithm
that calls program B as a subroutine, assuming that each call is
performed in one clock cycle. If $A \xrightarrow{\text{Cook}} B$ and
A is NP-hard, then B is also NP-hard [11].

Before giving the actual proof, we describe two more
programs and some of their properties.

A. Intermediate programs and their properties 

We first present the MinNorm program, which computes the
minimum norm vector that fulfills a set of constraints on its
scalar product with a given set of vectors $\{w_i\}_{i=1}^m$:

$$\{t^\star, z^\star\} = \mathsf{MinNorm}(\{w_i\}_{i=1}^m) := \min_{z \in \mathbb{R}^m} \|z\|^2 \quad \text{s.t.}\quad |w_i^T z| \ge 1,\ i = 1, \ldots, m. \qquad (17)$$

Lemma 3 ([17]): MinNorm is NP-hard.



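Algorithm 1 Reduction of MinNorm to MinPower
Input: Set of weight vectors $\{w_i\}_{i=1}^m$.
Output: Vector $z^\star$ that achieves the minimum norm, fulfilling
all the constraints $|w_i^T z^\star| \ge 1$.
Value of the minimum norm $t^\star = \|z^\star\|^2$.
1: Assign $H = (1\ 0\ \cdots\ 0) \in \mathbb{R}^{1 \times p}$.
2: Assign $\mathcal{E} = \{w_1, \ldots, w_m\}$.
3: Call $\{p^\star, P^\star\} = \mathsf{MinPower}(1, \mathcal{E}, H)$.
4: Assign $t^\star = p^\star$.
5: Assign $z^\star = (\mathrm{FirstRow}(P^\star))^T$.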



The second problem is MinPower, which computes the
precoder that minimizes the transmitted power such that the
minimum distance is above a certain threshold:

$$\{p^\star, P^\star\} = \mathsf{MinPower}(d, \mathcal{E}, H) := \min_{P} \operatorname{Tr}(PP^T) \quad \text{s.t.}\quad \min_{e \in \mathcal{E}} e^T P^T R_H P e \ge d. \qquad (18)$$



Lemma 4: Assume that $\{d_0^\star, P_0^\star\}$ is the output to the
program $\mathsf{MaxMinDist}(p_0, \mathcal{E}, H)$. It then follows that the output
to $\mathsf{MinPower}(d_0^\star, \mathcal{E}, H)$ is given by $\{p_0, P_0^\star\}$.

Similarly, assume that $\{p_0^\star, P_0^\star\}$ is the output to the
program $\mathsf{MinPower}(d_0, \mathcal{E}, H)$. It then follows that the output to
$\mathsf{MaxMinDist}(p_0^\star, \mathcal{E}, H)$ is given by $\{d_0, P_0^\star\}$.

Proof: See [18]. ■

Lemma 5: Assume that $\{d_0^\star, P_0^\star\}$ is the output to the
program $\mathsf{MaxMinDist}(p_0, \mathcal{E}, H)$. It then follows that the
output to $\mathsf{MaxMinDist}(\alpha p_0, \mathcal{E}, H)$ with $\alpha > 0$ is given by
$\{\alpha d_0^\star, \sqrt{\alpha}\, P_0^\star\}$.

Proof: The proof follows easily, e.g., by considering the
change of optimization variable $P = \sqrt{\alpha}\,\tilde{P}$ and noting that
the solution to the optimization problem remains unchanged
if the objective function is scaled by a constant parameter. ■

In the following we prove the chain of reductions

$$\mathsf{MinNorm} \xrightarrow{\text{Cook}} \mathsf{MinPower} \xrightarrow{\text{Cook}} \mathsf{MaxMinDist}.$$

B. Reduction of MinNorm to MinPower 

In Algorithm 1 we present our proposed Cook reduction of 
MinNorm to MinPower. 

Proposition 3: Algorithm 1 is a polynomial time Cook 
reduction of MinNorm to MinPower. 

Proof: Under the assumption that MinPower can be 
solved in one clock cycle, it follows that Algorithm 1 runs in 
polynomial time as well. It remains to check that the output 
of the algorithm corresponds to the solution to MinNorm. 

Note that for the particular values assigned to the channel
matrix $H$ and the set $\mathcal{E}$ in Steps 1 and 2 of Algorithm 1, the
program $\mathsf{MinPower}(1, \mathcal{E}, H)$ in (18) particularizes to

$$\min_{P}\ \operatorname{Tr}(PP^T) \qquad (19)$$
$$\text{s.t.}\quad \min_{i \in [1, m]} w_i^T p_1 p_1^T w_i \ge 1, \qquad (20)$$

where $p_1$ is a column vector with the elements of the first row
of the precoder matrix $P$, so that $P^T R_H P = p_1 p_1^T$. Observing
that the constraint in (20) only affects the elements of the first row
of matrix $P$, it is clear that the optimal solution to (19) fulfills
$[P^\star]_{ij} = 0$, $\forall i \neq 1$, as this assignment minimizes the
transmitted power. Recalling that $w_i^T p_1 p_1^T w_i = |w_i^T p_1|^2$, it is
now straightforward to see that the first row of matrix $P^\star$, which is
the solution to the problem in (19), is also the solution to MinNorm in (17). ■

Algorithm 2 Reduction of MinPower to MaxMinDist
Input: Desired squared minimum distance, $d$.
Set of vectors $\mathcal{E}$.
Channel matrix, $H$.
Output: Precoder $P^\star$ that minimizes the transmitted power,
fulfilling $\min_{e \in \mathcal{E}} e^T P^{\star T} R_H P^\star e \ge d$.
Transmitted power $p^\star = \operatorname{Tr}(P^\star P^{\star T})$.
1: Call $\{d_0^\star, P_0^\star\} = \mathsf{MaxMinDist}(1, \mathcal{E}, H)$.
2: Assign $p^\star = d / d_0^\star$.
3: Assign $P^\star = \sqrt{d / d_0^\star}\, P_0^\star$.

⁴Although we use the symbol $d_{\min}$, it denotes a squared distance.
Corollary 1: For the case where the set $\mathcal{E}$ is unconstrained,
the program MinPower is NP-hard.
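The construction in Steps 1 and 2 of Algorithm 1 can be illustrated numerically. The sketch below builds $H = (1\ 0\ \cdots\ 0)$ and $\mathcal{E} = \{w_i\}$, solves a tiny MinNorm instance ($m = 2$) by brute-force scanning of directions, and checks that placing $z^\star$ in the first row of an otherwise zero precoder meets the MinPower constraint with power $t^\star$, as in the proof of Proposition 3. The grid search only works because $m = 2$ (consistent with Lemma 3, no efficient general method is implied), and all names are hypothetical.

```python
import numpy as np


def algorithm1_instance(ws, p_rows):
    """Steps 1-2 of Algorithm 1: H = (1 0 ... 0) in R^{1 x p} and E = {w_1, ..., w_m}."""
    H = np.zeros((1, p_rows))
    H[0, 0] = 1.0
    E = [np.asarray(w, dtype=float) for w in ws]
    return H, E


def min_dist(P, E, H):
    """Constraint function of (18): min_{e in E} e^T P^T R_H P e with R_H = H^T H."""
    R_H = H.T @ H
    return min(float(e @ P.T @ R_H @ P @ e) for e in E)


def minnorm_2d_grid(ws, n_grid=200_000):
    """Brute-force MinNorm (17) for m = 2 by scanning unit directions u(phi)."""
    phi = np.linspace(0.0, np.pi, n_grid, endpoint=False)   # z and -z are equivalent
    U = np.stack([np.cos(phi), np.sin(phi)])                 # shape (2, n_grid)
    worst = np.min(np.abs(np.asarray(ws, dtype=float) @ U), axis=0)  # min_i |w_i^T u|
    k = int(np.argmax(worst))
    return 1.0 / worst[k] ** 2, U[:, k] / worst[k]            # (t*, z*)


ws = [[1.0, 0.0], [0.0, 1.0]]
t_star, z_star = minnorm_2d_grid(ws)
H, E = algorithm1_instance(ws, p_rows=3)
P = np.zeros((3, 2))
P[0] = z_star                               # only the first row is nonzero, as in the proof
print(t_star, np.trace(P @ P.T), min_dist(P, E, H))
```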

C. Reduction of MinPower to MaxMinDist 

In Algorithm 2 we present our proposed Cook reduction of 
MinPower to MaxMinDist. 

Proposition 4: Algorithm 2 is a polynomial time Cook 
reduction of MinPower to MaxMinDist. 

Proof: Under the assumption that MaxMinDist can be 
solved in one clock cycle, it follows that Algorithm 2 runs in 
polynomial time as well. It remains to check that the output 
of the algorithm corresponds to the solution to MinPower. 

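Assume that the output to $\mathsf{MaxMinDist}(1, \mathcal{E}, H)$ is given
by $\{d_0^\star, P_0^\star\}$ as in Step 1 of Algorithm 2. Note that, from the
power constraint in (16), we have $\operatorname{Tr}(P_0^\star P_0^{\star T}) = 1$. From
Lemma 5, choosing $\alpha = d / d_0^\star$, it follows that

$$\left\{d, \sqrt{d/d_0^\star}\, P_0^\star\right\} = \mathsf{MaxMinDist}(d/d_0^\star, \mathcal{E}, H). \qquad (21)$$

Now, applying Lemma 4, we have that

$$\left\{d/d_0^\star, \sqrt{d/d_0^\star}\, P_0^\star\right\} = \mathsf{MinPower}(d, \mathcal{E}, H), \qquad (22)$$

from which it immediately follows that $p^\star = d/d_0^\star$ and $P^\star =
\sqrt{d/d_0^\star}\, P_0^\star$, which completes the proof. ■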
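The three steps of Algorithm 2 can be written as a function that receives the MaxMinDist oracle as an argument, as sketched below. To exercise it without an (NP-hard) general solver, the test oracle covers only the toy case $p = m = 1$, where $\mathsf{MaxMinDist}(\rho, \mathcal{E}, H)$ is available in closed form ($P = \pm\sqrt{\rho}$). Both function names are hypothetical.

```python
import numpy as np


def algorithm2(d, E, H, max_min_dist):
    """Algorithm 2: solve MinPower(d, E, H) with one call to a MaxMinDist oracle."""
    d0, P0 = max_min_dist(1.0, E, H)   # Step 1: unit-power MaxMinDist
    p_star = d / d0                    # Step 2
    P_star = np.sqrt(d / d0) * P0      # Step 3 (rescaling justified by Lemmas 4 and 5)
    return p_star, P_star


def scalar_max_min_dist(rho, E, H):
    """Exact MaxMinDist for the toy case p = m = 1 (a stand-in oracle for testing)."""
    h2 = float(H[0, 0]) ** 2
    P = np.array([[np.sqrt(rho)]])     # the only feasible precoders are +/- sqrt(rho)
    return rho * h2 * min(float(e[0]) ** 2 for e in E), P


E = [np.array([2.0]), np.array([-2.0])]   # differences of the scalar BPSK set {-1, +1}
H = np.array([[0.5]])
p_star, P_star = algorithm2(3.0, E, H, scalar_max_min_dist)
print(p_star, P_star)
```

Apart from the oracle call, the reduction performs only a division and a scaling, which is why it runs in polynomial time.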

Corollary 2: For the case where the set $\mathcal{E}$ is unconstrained,
the program MaxMinDist is NP-hard.

Although the NP-hardness of MaxMinDist does not prove that
the maximization of the mutual information is also NP-hard, it
gives a strong indication of its expected computational complexity
in the high SNR regime, where the minimum distance is the key
performance parameter.

Given this expected complexity of the precoder design at
high SNR, and the fact that, in Sections III and IV, we characterized
the optimal left singular vectors and the singular values of the
precoder that maximizes the mutual information as a function
of the right singular vector matrix $V_P$, it seems reasonable
to attribute the computational complexity of the optimal
precoder design to the computation of $V_P$.

VII. Conclusion 

We have studied the problem of finding the precoder that 
maximizes the mutual information for an arbitrary (but given) 
input distribution. We have found a closed-form expression for 
the left singular vectors of the optimal precoder and have given 
a sufficient and necessary condition to compute the optimal 
singular values. We have also recalled that, in the low SNR 
or Gaussian signaling scenarios, the optimal precoder can be 
easily found as the mutual information does not depend on the 
right singular vectors. Finally, we have argued that, in the high
SNR regime, computing the optimal right singular vectors is
expected to be computationally hard.